off-policy deep reinforcement learning
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning
Model-free deep reinforcement learning (RL) algorithms have been widely used for a range of complex control tasks. However, slow convergence and sample inefficiency remain challenging problems in RL, especially when handling continuous and high-dimensional state spaces. To tackle these problems, we propose a general acceleration method for model-free, off-policy deep RL algorithms by drawing on the idea underlying regularized Anderson acceleration (RAA), an effective approach to accelerating the solution of fixed-point problems with perturbations. Specifically, we first explain how policy iteration can be applied directly with Anderson acceleration. Then we extend RAA to the case of deep RL by introducing a regularization term that controls the impact of perturbations induced by function approximation errors. We further propose two strategies, i.e., progressive update and adaptive restart, to enhance performance. The effectiveness of our method is evaluated on a variety of benchmark tasks, including Atari 2600 and MuJoCo. Experimental results show that our approach substantially improves both the learning speed and the final performance of state-of-the-art deep RL algorithms.
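To make the idea concrete, the following is a minimal sketch of regularized Anderson acceleration on a generic fixed-point problem x = g(x), not the paper's exact algorithm: the function name raa_step, the reg parameter, and the toy linear map are illustrative assumptions. The combination weights are obtained from a regularized, equality-constrained least-squares problem over recent residuals, which is the mechanism the abstract refers to.

```python
# Minimal sketch of regularized Anderson acceleration (RAA) for a generic
# fixed-point problem x = g(x). Names (raa_step, reg) are illustrative,
# not taken from the paper's implementation.
import numpy as np

def raa_step(xs, gs, reg=1e-3):
    """One RAA update from the m most recent iterates.

    xs, gs : lists of the last m iterates x_i and their images g(x_i).
    reg    : Tikhonov regularization weight that damps the combination
             coefficients and hence the effect of perturbations in g.
    """
    R = np.stack([g - x for x, g in zip(xs, gs)], axis=1)  # residuals, shape (d, m)
    m = R.shape[1]
    # Solve min_alpha ||R alpha||^2 + reg * ||alpha||^2  s.t.  sum(alpha) = 1,
    # whose KKT conditions give alpha proportional to (R^T R + reg I)^{-1} 1.
    A = R.T @ R + reg * np.eye(m)
    alpha = np.linalg.solve(A, np.ones(m))
    alpha /= alpha.sum()
    # Accelerated iterate: affine combination of the mapped iterates.
    return sum(a * g for a, g in zip(alpha, gs))

# Usage: accelerate a slowly converging linear fixed-point iteration x = Bx + c.
rng = np.random.default_rng(0)
d = 5
B = 0.9 * np.eye(d) + 0.01 * rng.standard_normal((d, d))
c = rng.standard_normal(d)
g = lambda x: B @ x + c

x, hist_x, hist_g, m = np.zeros(d), [], [], 5
for _ in range(50):
    hist_x.append(x)
    hist_g.append(g(x))
    hist_x, hist_g = hist_x[-m:], hist_g[-m:]
    x = raa_step(hist_x, hist_g, reg=1e-3)
print("residual norm:", np.linalg.norm(g(x) - x))
```

In the deep RL setting described in the abstract, g would roughly correspond to the Bellman-style target update and the combination would be formed over recent value estimates or targets; the regularization term is what keeps the weights stable when those estimates carry function-approximation error.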
Reviews: Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning
The main contribution of this paper is to apply Anderson acceleration to the setting of deep reinforcement learning. The authors first propose a regularized form of Anderson acceleration, and then show how it can be applied to two practical deep RL algorithms: DQN and TD3. Originality: This paper is in the vein of applying existing techniques to a novel domain. While the idea of introducing Anderson acceleration into RL is not new, as the authors mention, it has not previously been applied to deep RL methods. Although the originality is somewhat limited in this respect, developing a practical and effective improvement for deep RL algorithms is not trivial.
Reviews: Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning
This work is an interesting contribution to deep RL that considers using Anderson acceleration to improve off-policy TD-based algorithms. The approach is supported by some theory as well as experiments on standard benchmark problems. Overall, the reviewers like the paper and agree it should be accepted.
Dynamics of Resource Allocation in O-RANs: An In-depth Exploration of On-Policy and Off-Policy Deep Reinforcement Learning for Real-Time Applications
Mehdaoui, Manal, Abouaomar, Amine
Deep Reinforcement Learning (DRL) is a powerful tool for addressing complex challenges in mobile networks. This paper investigates the application of two DRL models, one on-policy and one off-policy, to resource allocation in Open Radio Access Networks (O-RAN). The on-policy model is Proximal Policy Optimization (PPO), and the off-policy model is the Sample Efficient Actor-Critic with Experience Replay (ACER); both target the resource allocation challenges of a Quality of Service (QoS) application with strict requirements. Motivated by the original work of Nessrine Hammami and Kim Khoa Nguyen, this study is a replication intended to validate and confirm their findings. PPO and ACER are run in the same experimental setup, with a mix of latency-sensitive and latency-tolerant users, and their performance is compared. The aim is to verify the efficacy of on-policy and off-policy DRL models in the context of O-RAN resource allocation. The results of this replication contribute to ongoing research and offer insights into the reproducibility and generalizability of the original study. The analysis reaffirms that both on-policy and off-policy DRL models outperform greedy algorithms in O-RAN settings. It also confirms the original observations that the on-policy model (PPO) offers a favorable balance between energy consumption and user latency, while the off-policy model (ACER) converges faster. These findings provide useful guidance for optimizing resource allocation strategies in O-RANs. Index Terms: 5G, O-RAN, resource allocation, ML, DRL, PPO, ACER.
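The on-policy/off-policy distinction that drives the comparison above comes down to how training data flow through the learner. Below is a hedged, library-free sketch of that structural difference (PPO-style fresh rollouts versus ACER-style experience replay); env, policy_update, and off_policy_update are placeholders and are not tied to the O-RAN setup or the original implementation.

```python
# Illustrative contrast between on-policy and off-policy training loops
# (PPO-style vs. ACER-style data flow). `env`, `policy_update`, and
# `off_policy_update` are placeholders, not the paper's O-RAN environment.
import random
from collections import deque

def collect_episode(env, policy):
    """Roll out one episode and return its transitions."""
    traj, state, done = [], env.reset(), False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        traj.append((state, action, reward, next_state, done))
        state = next_state
    return traj

def train_on_policy(env, policy, policy_update, iters=100):
    # PPO-style: each update uses only freshly collected data,
    # which is discarded afterwards.
    for _ in range(iters):
        batch = collect_episode(env, policy)
        policy_update(batch)

def train_off_policy(env, policy, off_policy_update, iters=100, batch_size=64):
    # ACER-style: transitions accumulate in a replay buffer and are reused
    # across many updates (with importance corrections handled inside
    # off_policy_update), improving sample efficiency per interaction.
    buffer = deque(maxlen=100_000)
    for _ in range(iters):
        buffer.extend(collect_episode(env, policy))
        if len(buffer) >= batch_size:
            off_policy_update(random.sample(list(buffer), batch_size))
```

The replay buffer is what lets the off-policy learner reuse each transition many times, which is consistent with the faster convergence reported for ACER in the abstract; the on-policy loop trades that reuse for updates computed only from the current policy's own behavior.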
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning
Shi, Wenjie, Song, Shiji, Wu, Hui, Hsu, Ya-Chu, Wu, Cheng, Huang, Gao